UPSTREAM PR #19531: Kimi Linear (correct conv state update + block implementation) #1165

Open
loci-dev wants to merge 94 commits into main from loci/pr-19531-Kimi-Linear

@loci-dev
Note

Source pull request: ggml-org/llama.cpp#19531

The current implementation updates the conv state incorrectly, which corrupts state when llama-server runs with parallel slots. This PR fixes that.

./build/bin/llama-server -c 16384 --parallel 8 --mmap -m ~/Kimi-Linear-48B-A3B-Instruct-GGUF/Kimi-Linear-48B-A3B-Instruct-jp-imatrix.IQ3_M.gguf -ngl 100
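The PR description does not say exactly how the conv state update went wrong, so the following is only a generic Python sketch, not the llama.cpp code. It shows why the rolling state of a short causal convolution has to be tracked per sequence once several sequences are interleaved, as happens with `--parallel`. The kernel weights, function names, and batch layout are all hypothetical.

```python
# Illustrative sketch only: this is NOT the llama.cpp implementation.
from typing import Dict, List, Tuple

KERNEL = [0.5, 0.25, 0.25, 1.0]  # hypothetical conv weights, kernel size 4


def conv_step(state: List[float], x: float) -> float:
    """Consume one token, update the rolling state in place, return the output."""
    window = state + [x]                 # last K-1 inputs plus the new one
    y = sum(w * v for w, v in zip(KERNEL, window))
    state[:] = window[1:]                # keep only the newest K-1 inputs
    return y


def run_alone(tokens: List[float]) -> List[float]:
    """Reference result: one sequence processed with its own fresh state."""
    state = [0.0] * (len(KERNEL) - 1)
    return [conv_step(state, x) for x in tokens]


def run_interleaved(batch: List[Tuple[int, float]],
                    per_sequence_state: bool) -> Dict[int, List[float]]:
    """Process (seq_id, token) pairs in arrival order."""
    states: Dict[int, List[float]] = {}
    shared = [0.0] * (len(KERNEL) - 1)   # one buffer reused by every sequence
    out: Dict[int, List[float]] = {}
    for seq_id, x in batch:
        if per_sequence_state:
            state = states.setdefault(seq_id, [0.0] * (len(KERNEL) - 1))
        else:
            state = shared
        out.setdefault(seq_id, []).append(conv_step(state, x))
    return out


seq_a = [1.0, 2.0, 3.0]
seq_b = [10.0, 20.0, 30.0]
# Tokens of the two sequences arrive interleaved, as with parallel slots.
batch = [(0, 1.0), (1, 10.0), (0, 2.0), (1, 20.0), (0, 3.0), (1, 30.0)]

good = run_interleaved(batch, per_sequence_state=True)
bad = run_interleaved(batch, per_sequence_state=False)
```

With per-sequence state, `good[0]` and `good[1]` match `run_alone(seq_a)` and `run_alone(seq_b)`. With the shared buffer, each sequence's outputs depend on the other sequence's tokens, which is the general kind of cross-request corruption a parallel server has to avoid.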

This PR also includes the block implementation, which speeds up prompt processing (pp) by 20% and reduces VRAM usage.

@loci-dev loci-dev force-pushed the main branch 12 times, most recently from f9aec49 to e6c519b Compare March 22, 2026 02:17
@loci-dev loci-dev force-pushed the main branch 9 times, most recently from 135dbe7 to 89a1190 Compare March 29, 2026 02:17
@loci-dev loci-dev force-pushed the main branch 9 times, most recently from 6ef937b to 3655621 Compare April 5, 2026 02:18